MacTech Vol 01-1984/5

Inline Code

Volume Number: 1

Issue Number: 9

Column Tag: Forth Forum

Inline Code for MacForth

By Jörg Langowski, Chemical Engineer, Fed. Rep. of Germany, MacTutor Editorial Board

Speeding up Forth with Inline Code

When you use your computer for applications that require a lot of data shuffling and calculations, work with large arrays and matrices and so on, you tend to become a little paranoid about speed. Forth code is very compact thanks to its threaded structure, and word execution (i.e. subroutine calling) is reasonably well optimized in MacForth (see MacTutor V1 No2). Even so, I have always felt uncomfortable with the overhead that goes into the execution of a simple word like DROP, whose 'active part' consists of one 16-bit word of machine code.

Just as a reminder: when the Forth system executes the token for DROP in a definition, it calls a subroutine that looks like this:

DROP    ADDQ.L  #4,A7     ; drop one 32-bit cell (A7 is the data stack pointer)
        JMP     (A4)      ; jump to NEXT, which fetches the next token

So it is a simple 4-byte increment of the stack pointer that does the DROP job.

But then the next token has to be fetched and executed by jumping to the NEXT routine, whose address is contained in A4, the base pointer. Compared with the increment itself, this amounts to an overhead of several hundred percent. The overhead is less dramatic with other words, but it is still there. All in all, the Sieve benchmark needs 21 seconds to run in MacForth, compared with 9 seconds in compiled C (Consulair).
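To make the cost of dispatching concrete, here is a minimal Python model of a token-threaded inner interpreter. It is a sketch, not MacForth's actual NEXT routine. The index `ip` stands in for register A3, and the loop stands in for NEXT, which MacForth reaches with JMP (A4). The token names are hypothetical. The model shows that a primitive as trivial as DROP, a single `stack.pop()` here and a single ADDQ.L in MacForth, still pays for one complete fetch-and-dispatch cycle.

```python
# Toy token-threaded inner interpreter (illustrative model, not MacForth code).
# `ip` plays the role of A3 (pointer to the next token); the while loop plays
# the role of NEXT. MacForth's stack grows downward (DROP adds 4 to A7); this
# Python list grows upward, so DROP is a pop().

def run(code):
    """Execute a list of tokens; return (stack, number of trips through NEXT)."""
    stack = []
    ip = 0                        # index of the next token (A3)
    dispatches = 0
    while True:
        token = code[ip]          # NEXT: fetch the token...
        ip += 1                   # ...advance the token pointer...
        dispatches += 1           # ...and count one full dispatch
        if token == "LIT":        # literal value stored inline after the token
            stack.append(code[ip])
            ip += 1
        elif token == "DROP":     # the entire 'active part' of DROP
            stack.pop()
        elif token == "DUP":
            stack.append(stack[-1])
        elif token == "+":
            b = stack.pop()
            stack.append(stack.pop() + b)
        elif token == "EXIT":
            return stack, dispatches
        else:
            raise ValueError(f"unknown token {token!r}")


if __name__ == "__main__":
    stack, n = run(["LIT", 1, "LIT", 2, "DROP", "DUP", "+", "EXIT"])
    print(stack, n)   # [2] 6
```

Six dispatches are needed to do five tiny operations and return. Inline code removes exactly this per-token overhead.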

How can we speed up the code? After all, we have complete control over what goes into the dictionary and could put the machine code that we need right in there, with no need for time-consuming subroutine calls. This is what the Forth 2.0 assembler enables you to do. However, code written in the Forth assembler tends to look much more cryptic than 'normal' assembler, which, after all, is readable with adequate documentation.

It would be much nicer if we had a means to create the assembly code that corresponds to a DROP by writing a similar word, such as %DROP: something like a macro. There would be no need to worry about which registers to use, and you could write your routine in 'almost normal' Forth code.

It shouldn't be that difficult to persuade the Forth system to execute machine code that is embedded in a definition. Every Forth word starts with at least one executable piece of machine code: trap calls for Forth-defined words such as colon definitions, and 'real' 68000 code for machine code definitions. However, this gives you either machine code or Forth, not both. Our goal is to define words that allow switching between 68000 and Forth code within one definition. Similar words do exist in the Forth 2.0 assembler, but it lacks a set of macros that would let you write inline Forth code instead of assembly code. Furthermore, you cannot define control structures that easily.

Assume we have Forth code that looks like this:

...

etc. This sequence of instructions will get executed just fine if the word in front of the machine code is one that transfers execution to the code just following it. We'll call this word >CODE and define it as follows:

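: >CODE   ( -- )
   here 2+ make.token w,   ( compile a token for the address just past it )
   [compile] [ ;           ( leave compile mode: what follows is executed )
immediate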

This word, which is executed during compilation, takes the next free address in the dictionary and adds 2, skipping the 2-byte token itself, because this is where execution of the machine code is to start. It then compiles this address as a token into the dictionary. Since a token just tells the Forth interpreter 'jump to the address that I refer to', machine code execution will start at the address following >CODE.
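The `here 2+` arithmetic can be checked with a toy dictionary. In the sketch below, the class and method names are hypothetical, and the token is simply the 16-bit address it refers to. That is a simplifying assumption, because MacForth's `make.token` performs its own address-to-token mapping. The point is only that the compiled token ends up pointing at the first byte after itself, which is where the inline code gets assembled.

```python
import struct


class Dictionary:
    """Toy Forth dictionary: a byte array plus 'here', the next free address.
    Assumption: a token is the plain 16-bit address it refers to."""

    def __init__(self, size=256):
        self.mem = bytearray(size)
        self.here = 0

    def w_comma(self, value):
        """Model of w, : store a 16-bit big-endian word (the 68000 is big-endian)."""
        struct.pack_into(">H", self.mem, self.here, value & 0xFFFF)
        self.here += 2

    def make_token(self, addr):
        return addr          # hypothetical identity mapping

    def to_code(self):
        """Model of >CODE at compile time: here 2+ make.token w,
        The token occupies 2 bytes, so here + 2 is the first byte after it."""
        self.w_comma(self.make_token(self.here + 2))


d = Dictionary()
d.w_comma(0x1234)            # some earlier (made-up) token in the definition
token_at = d.here
d.to_code()                  # compile the >CODE token
code_start = d.here          # inline machine code is assembled from here on
d.w_comma(0x588F)            # ADDQ.L #4,A7, i.e. an inline DROP
token = struct.unpack_from(">H", d.mem, token_at)[0]
print(token, code_start)     # 4 4
```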

This is what happens at execution time. At compilation time, the words following >CODE in the input stream are executed, not compiled (this is what the [COMPILE] [ does). Therefore, if the words following >CODE are macros that stuff assembly code into the dictionary, you have your inline code right there.

We'll get to those macros in a minute. First, there remains the problem of how to get out of the machine code. You might recall that all machine-level Forth definitions finish with a

        JMP     (A4)

and that the NEXT routine, pointed to by A4, fetches the next token from the Forth code.

The pointer to the next token is in register A3. Unfortunately, after we have executed >CODE, A3 is unchanged and still points to the word following the >CODE token, which is 68000 code and certainly nothing the interpreter will swallow. Therefore we have to reset A3 before we jump back into the Forth interpreter. This is what the word >FORTH does:

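: >FORTH   ( -- )   ( the numbers below are hexadecimal )
   47fa0004 ,        ( assembles the LEA below )
   4ed4 w,           ( assembles the JMP below )
   [compile] ] ;     ( switch back to compile mode )

        LEA     4(PC),A3     ; A3 := address just after the JMP
        JMP     (A4)         ; resume the NEXT routine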
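The two numbers compiled by >FORTH are the 68000 encodings of LEA 4(PC),A3 and JMP (A4). The Python check below derives them from the instruction formats. The helper names are hypothetical. The check also shows why the displacement is 4. With PC-relative addressing, the PC value used is the address of the extension word, that is, the LEA's own address plus 2. The LEA is 4 bytes long and the JMP is 2 bytes long, so PC + 4 lands exactly on the first word after the JMP.

```python
def lea_pc_rel(an, disp):
    """LEA d16(PC),An -> (opcode word, extension word).
    Format: 0100 rrr 111 mmm rrr; the (d16,PC) mode is mode 7, register 2."""
    opcode = 0x4000 | (an << 9) | (0b111 << 6) | (0b111 << 3) | 0b010
    return opcode, disp & 0xFFFF


def jmp_ind(an):
    """JMP (An) -> opcode word. Format: 0100 1110 11 mmm rrr; (An) is mode 2."""
    return 0x4EC0 | (0b010 << 3) | an


op, ext = lea_pc_rel(3, 4)                  # LEA 4(PC),A3
jmp = jmp_ind(4)                            # JMP (A4)
print(hex((op << 16) | ext), hex(jmp))      # 0x47fa0004 0x4ed4

lea_addr = 0x1000                           # arbitrary example address
a3 = (lea_addr + 2) + 4                     # PC = address of extension word
after_jmp = lea_addr + 4 + 2                # 4-byte LEA, then 2-byte JMP
print(a3 == after_jmp)                      # True
```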

Remember, when >FORTH appears in the input stream, we are still in execution mode from the preceding >CODE (unless we have mixed things up). So >FORTH gets executed when used in a definition. It assembles code that, at run time, loads A3 with the address following the JMP and then executes the JMP. Finally, the mode is switched back to regular Forth compilation.
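The whole round trip can be modelled in a few lines of Python. In this sketch, everything is hypothetical: the %DROP macro, the tuple "instructions", and the functions. A list with one cell per token or instruction stands in for the dictionary, which really holds 16-bit words. At compile time, >CODE and >FORTH flip between compiling tokens and executing macros, as [ and ] do. At run time, the >CODE token jumps into the inline code, and the >FORTH sequence resets `ip` (the model's A3) past itself. Without that reset, NEXT would try to interpret machine code, and the model raises an error in that case.

```python
# Toy model: one definition mixing threaded tokens with inline "machine code".
PRIMS = {                               # ordinary primitives, reached via NEXT
    "DUP": lambda s: s.append(s[-1]),
    "+": lambda s: s.append(s.pop() + s.pop()),
}
NATIVE = {"POP": lambda s: s.pop()}     # pretend CPU op (~ ADDQ.L #4,A7)
MACROS = {"%DROP": lambda mem: mem.append(("OP", "POP"))}


def compile_definition(words):
    """>CODE switches to executing macros, >FORTH switches back."""
    mem, executing = [], False
    for w in words:
        if w == ">CODE":
            mem.append(("CODE-AT", len(mem) + 1))   # token -> next cell (here 2+)
            executing = True                        # [compile] [
        elif w == ">FORTH":
            mem.append(("OP", "LEA-JMP"))           # LEA 4(PC),A3 / JMP (A4)
            executing = False                       # [compile] ]
        elif executing:
            MACROS[w](mem)                          # macro assembles inline code
        else:
            mem.append(("TOKEN", w))                # normal compilation
    mem.append(("TOKEN", "EXIT"))
    return mem


def run(mem):
    stack, ip = [], 0                   # ip plays the role of A3
    while True:                         # NEXT: fetch token, advance ip
        kind, arg = mem[ip]
        ip += 1
        if kind == "TOKEN":
            if arg == "EXIT":
                return stack
            if isinstance(arg, int):
                stack.append(arg)       # literal
            else:
                PRIMS[arg](stack)
        elif kind == "CODE-AT":         # >CODE token: jump into inline code
            pc = arg
            while mem[pc] != ("OP", "LEA-JMP"):
                NATIVE[mem[pc][1]](stack)
                pc += 1
            ip = pc + 1                 # >FORTH: A3 := address after the JMP
        else:
            raise RuntimeError(f"NEXT tried to interpret machine code at {ip - 1}")


defn = compile_definition([1, 2, 3, ">CODE", "%DROP", "%DROP", ">FORTH", "DUP", "+"])
print(run(defn))   # [2]
```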

Between >CODE and >FORTH we can now place our macros that generate inline machine code corresponding to Forth primitives. The code for any of the primitives is
